provide an empirical example of a construct-centered reliability analysis
using writing performance assessment data from a large urban school district.
Writing assessments from 17,330 students in grades 3, 5, and 8 were used in
the analysis. Scoring guides were developed around the constructs, each
comprised of dimensions for raters to consider when scoring. Resampling was
done by randomly selecting three to four percent of the students from the
assessments before a generalizability study with a fully crossed two-facet
design was conducted. A large proportion of variance was attributed to the
constructs of rhetorical effectiveness and conventions, as well as to the
interaction between constructs and raters. Variance due to the rater facet was
very small. An array of acceptable G coefficients across samples was
obtained. It is suggested that high reliability or generalizability is
achievable using a construct-centered reliability approach to identify
construct-relevant variances when there is a high degree of fit between the

Abstract



Performance assessment differs from traditional assessment in that it claims to measure
multi-traits simultaneously rather than consistently measure a single trait. Traditional reliability
analyses oversimplify issues of generalizability and stability when applied to cognitively complex
performance assessments. Nichols and Sugrue (1998) recommend that reliability analyses
incorporate the complex cognitive assumptions so that performance assessments more faithfully
represent the multidimensional constructs common to performance assessments. More
specifically, with cognitively complex assessments, reliability or generalizability must be
reconceptualized to reflect the theoretical expectations of the test developer as well as the
complex thinking of the test takers (e.g., problem solving, communication of ideas, reasoning, etc.)
(Nichols and Smith, 1998). With a construct-centered reliability analytical approach, the
reliability analysis should crystallize the multi-traits or constructs that the test specialists
intended to measure in student performance, and then estimate the degree of fit between the
theoretical expectations of test developers and the performance exhibited by the students.

This study attempted to provide an empirical example of a construct-centered reliability
analysis using writing performance assessment data from a large, urban school district.
Approximately 31,645 writing assessments were taken in English during the spring of 1998 at 6
grade levels. Grade 3, 5, and 8 writing assessments were used in the analysis. The data used for
this analysis contain student responses to one writing prompt for each student. The assessments
were designed to focus on the constructs underlying the different domains tested. Scoring
guides were developed around these constructs, each comprised of dimensions for raters to consider
when scoring. Raters were trained to follow the construct-centered scoring guide. Resampling
was done by randomly selecting 3-4% of the students involved in the assessments before a
generalizability study with a fully crossed two-facet design was conducted. A large proportion of
variance was attributed to the constructs of rhetorical effectiveness and conventions as well
as to the interaction between constructs and raters. Variance due to the rater facet was very small.
An array of acceptable G coefficients across samples was obtained. It is suggested that high
reliability or generalizability is achievable using a construct-centered reliability analytical
approach to identify construct-relevant variances where there is a high degree of fit between the
substantive expectations generated from the test specialist’s understanding of the construct, as
realized by the measurement procedure, and observation (Nichols and Smith, 1998).
A Construct-Centered Generalizability Model: Analyzing Underlying Constructs of Cognitively
Complex Performance Assessments

Ying Hong Jiang, Long Beach Unified School District 
Philip L. Smith, University of Wisconsin-Milwaukee 

The Methodological Challenge 

Generalizability problems in cognitively complex assessments. Traditional reliability analyses 
oversimplify issues of generalizability and stability when applied to cognitively complex 
performance assessments. To add to the challenge of estimating reliability or generalizability of 
cognitively complex assessments, proponents of both performance and portfolio assessments 
build them to measure multidimensional constructs (Romberg and Wilson, 1995; Wiggins, 1993). 

Unfortunately, a large body of research (e.g., Koretz, McCaffrey, Klein, Bell, & Stecher, 1992;
Shavelson, Gao, & Baxter, 1993) reports low reliability or generalizability indices for
performance assessments. The multidimensionality of constructs, which we call “cognitive
complexity,” is hinted at in traditional generalizability analyses. Jiang, Smith and Nichols (1997)
conducted a meta-analysis of 22 studies using cognitively complex performance tests and found
that between-task variance was the largest source of variance, significantly larger than
between-student variance. Task variance, under traditional conceptualizations of reliability, is
something to be “factored out” or “equalized” rather than incorporated into generalizability
estimates. The problem of task variance “swamping” the variance associated with students creates
a “Catch-22” for developers of performance assessments, who create measures under different
assumptions than those of random sampling theory.

Construct-Centered Generalizability. Recent work in reliability theory proposes that in
cognitively complex assessments, the distinction between “reliability” and “validity” becomes
blurred. The same conclusion has been reached in validity theory (Cronbach, 1988, 1989;
Messick, 1989) with the “unitary” concept of validity. Nichols and Sugrue (1998) recommend
that reliability analyses incorporate the complex cognitive assumptions so that performance
assessments more faithfully represent the multidimensional constructs common to performance
assessments.

More specifically, with cognitively complex assessments, reliability or generalizability must be
reconceptualized to reflect the theoretical expectations of the test developer as well as the
complex thinking of the test takers (e.g., problem solving, communication of ideas, reasoning, etc.)
(Nichols and Smith, 1998). The logical implication for practice is that test developers should
incorporate both (1) the theoretical expectations or constructs to be tested and (2) the multi-trait,
complex thinking required of the test takers into their reliability analyses.

How Might Construct Centered Reliability Work? 

Overview of the study. Conceptually, a performance assessment score may be viewed as a 
sample of student performance drawn from a complex universe defined by a combination of all 
the admissible tasks, occasions, raters, and measurement methods. Generalizability theory refers 
to each of these dimensions as "facets". The task facet represents the content in a subject-matter 
domain; the occasion facet includes all possible occasions on which a decision maker would be 
equally willing to accept a score on the performance assessment. The rater facet includes all 
possible individuals who are trained to score the performance reliably. Typically, these three 
facets are viewed as primary sources of error in a measurement procedure, especially a
performance assessment. 

Traditional reliability analysis is concerned with the homogeneity of the tasks comprising an
assessment. Assessments based on traditional measurement models are intended to measure a
single phenomenon, or a ‘unitary trait’. The dependability of an assessment is assessed by a
reliability or generalizability coefficient, which is a function of the degree of homogeneity of
the tasks and the number of tasks. Within the traditional reliability framework, a highly
dependable assessment is comprised of a sufficiently large number of homogeneous tasks. These
conditions for acceptable dependability pose a difficult challenge for performance writing
assessments, as it is not only inappropriate but also impossible to administer ‘a sufficiently large
number of homogeneous writing tasks’ to a large number of students.
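
The dependence of a traditional reliability coefficient on task homogeneity and task count can be illustrated with the Spearman-Brown formula from classical test theory. This is only a sketch: the formula is not used in the study itself, and the function name and the mean inter-task correlation of 0.3 are hypothetical values chosen for illustration.

```python
def spearman_brown(mean_intertask_r, n_tasks):
    """Reliability of a composite of n_tasks parallel tasks.

    mean_intertask_r stands in for task homogeneity (the average
    correlation between tasks); both arguments are illustrative.
    """
    return n_tasks * mean_intertask_r / (1 + (n_tasks - 1) * mean_intertask_r)


# With one moderately homogeneous task the coefficient stays low;
# it rises only as many comparable tasks are added.
for k in (1, 2, 5, 10):
    print(k, round(spearman_brown(0.3, k), 3))
```

Under these assumed values, a single writing task cannot reach a conventionally acceptable coefficient, which is the constraint the paragraph above describes.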

Performance assessment differs from traditional assessment in that it claims to measure
multi-traits simultaneously rather than consistently measuring a single trait. With a
construct-centered reliability analytical approach, the reliability analysis should crystallize the
multi-traits or constructs that the test specialists intended to measure in student performance,
and then estimate the degree of fit between the theoretical expectations of test developers and the
performance exhibited by the students. Hence, a good scoring rubric should closely reflect the
theoretical expectations, that is, the multi-traits or constructs the assessment is developed to
measure. Since one assessment is comprised of multi-traits rather than a single trait, we should
further decompose or unfold tasks into mini-tasks, which are the smaller and more homogeneous
dimensions that comprise the multi-traits or constructs for each performance assessment. Since
the mini-tasks are nested within the multi-traits, the level of reliability analysis should reflect
this hierarchical structure. Furthermore, when reliability analysis incorporates the theoretical
expectations of test specialists, more variance is defined systematically rather than left
undefined as residual or error variance, thus yielding higher assessment dependability than is
obtained using traditional methods.

The purpose of this study was to provide an empirical example of a construct-centered 
reliability analysis using writing performance assessment data from a large, urban school district. 
Approximately 31,645 writing assessments were taken in English during the spring of 1998 at 6 
grade levels. We used grade 3, 5, and 8 writing assessments in the analysis. 

Sample and Methods 

Sample. The district from which our study data were taken administers writing performance
assessments in grades 3, 5, 6, 8, 10, and 11 each spring. The sample data used were from spring
1998. The assessments are given to all students at the grade level and are used for program
evaluation purposes. Two prompts from the same writing domain are administered at each grade
level. Students are randomly assigned to prompts.

Table 1 displays the number of assessments, writing domains assessed in English and grade levels 
included in the study. 



Table 1
Number of Student Writing Assessments in English by Grade Level

Grade Level   Writing Domain Tested             Number of Assessments
3             Autobiographical Incident         5488
5             Observation                       6009
8             Speculation about Cause-Effect    5833
TOTAL                                           17330

The student population in our sample is quite diverse. In the district, 54.8% of students are in
the Free and Reduced Price Lunch program; 40.7% are Hispanic, 20.2% African American, 14% Asian,
and 19.4% white. The percentage of English Language Learners ranges from 40.7% at the elementary
level to 29% at middle school and 22.3% at high school. Table 2 summarizes the demographic
characteristics of our sample. Because the entire population was sampled at the grade levels
included in our study, the sample demographics represent the district profile quite well, except
that, because the sample included only assessments in English, the proportion of English language
learners is lower in the sample than in the district population (see Table 2, Student Sample
Demographics).

(Insert Table 2 here) 

The assessments. Students in grades 3, 5, and 8 took writing assessments that were intended to
test specific domains with several simultaneous underlying constructs. Table 3 summarizes the
constructs by grade level and writing domain (see also the scoring guides, Tables 4, 5, and 6 in
the appendix).

Table 3
Constructs by Writing Domain

Grade   Writing Domain         Rhetorical Effectiveness         Knowledge   Conventions
                               Constructs Measured                          Constructs Measured
3       Autobiographical       Incidence                        N/A         Usage
        incident               Context                                      Spelling and Mechanics
                               Voice and style
                               Significance
5       Observation            Identification of subject        N/A
                               Context
                               Observational stance
                               Presentation of experience
8       Speculation about      Presentation of situation        Knowledge
        causes and effect      Logic and relevance of causes
                               Elaboration of argument

Chart 1 explains the relationship between the domains tested for grades 3, 5, and 8 and the
constructs underlying the domains as reflected in the scoring guide.

— (Insert Chart 1 here) — 

Scoring. Teachers were released district-wide for one day in late May to score student
assessments and were asked at the end of the scoring day to summarize the learning issues they
encountered in the papers. Teachers were assigned to rate papers at their own grade level and
were trained prior to rating on two scoring guides, one for Rhetorical Effectiveness and one for
Conventions (see Appendix for copies of the scoring guides). The rater training consisted of one
to two hours of guided rating and discussion managed by a district-trained and experienced Table
Leader. The Table Leader discussed the prompt and scoring guide, had teachers write to the
prompt (depending upon the experience of the group being trained), and facilitated the rating of
“anchor papers” (previously selected and scored papers that represent the score points in the
Rhetorical Effectiveness and Conventions scoring guides). Each student received one score on
Rhetorical Effectiveness and one on Conventions. Scores ranged from 1 to 6, with 6 being the
highest. Once the Table Leader was satisfied that the teachers had reached consensus about the
scoring guides, rating began. Papers from each of the district's 62 elementary and 21 middle
schools were randomly assigned to tables of teachers. Teachers gave each paper a first read.
During the first reading, Table Leaders “read behind” teachers, and if a teacher was straying from
the scoring guides, she was pulled aside for re-training. Papers were then collected and randomly
assigned to another set of teacher raters for a second reading. Table Leaders checked the second
reads. When scores were more than two points apart, the Table Leader gave the paper a third
reading and that “expert” score became the score reported to the student.
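
The third-reading rule described above can be sketched as a simple check. The function and parameter names are hypothetical, and the sketch covers only the flagging step: the text does not say how the two scores were combined when they were within two points, so no resolution rule is implemented here.

```python
def needs_third_read(first_score, second_score, max_gap=2):
    """Return True when two ratings on the 1-6 scale differ by more than max_gap.

    max_gap=2 follows the "more than two points apart" rule in the text.
    """
    return abs(first_score - second_score) > max_gap


print(needs_third_read(2, 5))  # gap of 3 points: flagged for the Table Leader
print(needs_third_read(3, 5))  # gap of 2 points: not flagged
```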

Teachers assigned one score for Rhetorical Effectiveness and one for Conventions. In
determining scores, teachers considered grade-level-specific constructs defined by numerous
dimensions. In the third grade writing domain of autobiographical incident, for example,
rhetorical effectiveness was divided into four constructs (incidence, context, voice and style, and
significance), and conventions was divided into two constructs (usage, and spelling and mechanics).
The scoring guide directed teachers to evaluate incidence along 34 dimensions, context along 11
dimensions, voice and style along 18 dimensions, and significance along 16 dimensions.

1) Grade 3 data

For grade 3, the domain tested is AUTOBIOGRAPHICAL INCIDENT. The two prompts used
were Form A: A Time I Learned How To Do Something, and Form B: A Time I Didn't Give Up.
The four constructs underlying AUTOBIOGRAPHICAL INCIDENT are incidence,
context, voice and style, and significance. According to the scoring guide, there are 34
dimensions to measure the construct of incidence.

(Insert Chart 2 here)

From the chart, we can assume that a score of 6 implies that a student has met the 11
dimensions (IQ1 to IQ11) developed according to the scoring guide. Likewise, for a score of 5,
we can assume that a student has met the 4 dimensions (IQ12 to IQ15), and so on. Charts 3, 4, and 5
show the dimensions indicating the constructs of context, voice and style, and significance,
respectively.

(Insert Chart 3, 4, and 5 here) 

2) Grade 5 data 

The domain tested for grade 5 is OBSERVATION. The two prompts used were Form A: A
Rainy Day, and Form B: Watching an Animal. The four constructs underlying
OBSERVATION are identification of subject, context, observational stance, and presentation
of the experience. According to the scoring guide, there are 5 dimensions to measure the
construct of identification of subject, 6 dimensions to measure context, 7 dimensions to measure
the construct of observational stance, and 18 dimensions to measure presentation of the
experience. For each score category from 1 to 6, there are corresponding dimensions developed to
indicate whether a student has met them. Charts 6, 7, 8, and 9 show the dimensions indicating
the 4 constructs.

(Insert chart 6, 7, and 8 here) 

3) Grade 8 data 

SPECULATION ABOUT CAUSES AND EFFECT is the domain tested for grade 8. The two
prompts used were Form A: Lewis and Clark, and Form B: Overland Trails to the West. The
three constructs underlying the domain were presentation of the situation, logic and
relevance of causes, and elaboration of argument. Charts 10, 11, and 12 show that 36
dimensions were developed to measure the construct of presentation of the situation, 40 dimensions
to measure logic and relevance of causes, and 31 dimensions to measure elaboration of argument.

(Insert chart 10, 11, and 12 here) 
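
The nesting of dimensions within rhetorical-effectiveness constructs, as reported above for the three grades, can be written down as a small data structure. This is a sketch for orientation only: the dimension counts are taken from the text, while the variable names and the dictionary layout are assumptions.

```python
# Number of scoring-guide dimensions nested within each rhetorical-effectiveness
# construct, keyed by grade (counts as reported in the text).
DIMENSIONS_PER_CONSTRUCT = {
    3: {"incidence": 34, "context": 11, "voice and style": 18, "significance": 16},
    5: {
        "identification of subject": 5,
        "context": 6,
        "observational stance": 7,
        "presentation of the experience": 18,
    },
    8: {
        "presentation of the situation": 36,
        "logic and relevance of causes": 40,
        "elaboration of argument": 31,
    },
}

# Total number of dimensions a rater weighs per grade.
totals = {grade: sum(c.values()) for grade, c in DIMENSIONS_PER_CONSTRUCT.items()}
print(totals)
```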

Analytical Procedure 

Re-sampling. In each grade, approximately 5,000 to 6,000 students were involved in the
assessments. We randomly sampled 3-4% of the assessments from the entire database three times with
replacement before we obtained the parameter estimates for analysis.
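
One way to read this re-sampling step is sketched below. It assumes that each of the three samples is drawn from the full grade-level pool (so a student may appear in more than one sample) and that students are not repeated within a sample; the text does not settle either point. The function name, the 4% fraction, and the seed are hypothetical.

```python
import random


def draw_samples(student_ids, fraction=0.04, n_samples=3, seed=0):
    """Draw n_samples random subsets, each about `fraction` of the pool.

    Every subset is drawn from the full pool, so subsets may overlap.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    k = round(len(student_ids) * fraction)
    return [rng.sample(student_ids, k) for _ in range(n_samples)]


grade3_ids = list(range(5488))  # 5488 grade 3 assessments (Table 1)
samples = draw_samples(grade3_ids)
print([len(s) for s in samples])
```

With a 4% fraction, each grade 3 sample holds 220 students, which is in line with the sample sizes of 213 to 228 reported for grade 3.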

Generalizability analysis. We employed generalizability theory (Cronbach, Gleser, Nanda, &
Rajaratnam, 1972) to analyze the dependability of the assessments. Dependability usually refers
to the accuracy of generalizing from a person’s observed score on a test to the average score that
person would have received under all the possible conditions that the test user would be equally
willing to accept. The strength of G theory is that multiple sources of error can be estimated
separately in a single analysis. We designed a G study with two facets to focus on the variance
components due to the constructs comprising the writing domains, based on observed scores on
rhetorical effectiveness and conventions, and on the variance components due to construct-by-rater
interactions.

For grades 3, 5, and 8, we used a (2x2) two-facet fully crossed design, with two constructs
(rhetorical effectiveness and conventions) and two raters. For grade 8, although knowledge was
one of the constructs underlying the domain tested, we considered it a different construct from
rhetorical effectiveness and conventions since it measures mastery of historical knowledge.

Table 4 shows the generalizability analysis design for grades 3, 5, and 8.



Table 4: G Study Two-Facet Fully Crossed Design for Grades 3, 5, and 8

Student    Rater    Rhetorical    Convention
           1        score         score
           2        score         score
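
For a fully crossed student x construct x rater design like this one, variance components are commonly estimated from ANOVA mean squares. The sketch below shows that approach in plain Python. The paper does not say which software or estimation method was used, so the random-effects expected-mean-square solution here is an assumption, and the function and key names are hypothetical.

```python
from itertools import product
from statistics import fmean


def g_study_pxcxr(x):
    """Estimate variance components for a fully crossed p x c x r design.

    x[p][c][r] is the score of student p on construct c from rater r,
    with exactly one observation per cell.
    """
    n_p, n_c, n_r = len(x), len(x[0]), len(x[0][0])
    P, C, R = range(n_p), range(n_c), range(n_r)

    grand = fmean(x[p][c][r] for p, c, r in product(P, C, R))
    m_p = [fmean(x[p][c][r] for c, r in product(C, R)) for p in P]
    m_c = [fmean(x[p][c][r] for p, r in product(P, R)) for c in C]
    m_r = [fmean(x[p][c][r] for p, c in product(P, C)) for r in R]
    m_pc = [[fmean(x[p][c][r] for r in R) for c in C] for p in P]
    m_pr = [[fmean(x[p][c][r] for c in C) for r in R] for p in P]
    m_cr = [[fmean(x[p][c][r] for p in P) for r in R] for c in C]

    # Mean squares: sums of squares divided by their degrees of freedom.
    ms_p = n_c * n_r * sum((m - grand) ** 2 for m in m_p) / (n_p - 1)
    ms_c = n_p * n_r * sum((m - grand) ** 2 for m in m_c) / (n_c - 1)
    ms_r = n_p * n_c * sum((m - grand) ** 2 for m in m_r) / (n_r - 1)
    ms_pc = n_r * sum((m_pc[p][c] - m_p[p] - m_c[c] + grand) ** 2
                      for p, c in product(P, C)) / ((n_p - 1) * (n_c - 1))
    ms_pr = n_c * sum((m_pr[p][r] - m_p[p] - m_r[r] + grand) ** 2
                      for p, r in product(P, R)) / ((n_p - 1) * (n_r - 1))
    ms_cr = n_p * sum((m_cr[c][r] - m_c[c] - m_r[r] + grand) ** 2
                      for c, r in product(C, R)) / ((n_c - 1) * (n_r - 1))
    ms_pcr = sum((x[p][c][r] - m_pc[p][c] - m_pr[p][r] - m_cr[c][r]
                  + m_p[p] + m_c[c] + m_r[r] - grand) ** 2
                 for p, c, r in product(P, C, R)
                 ) / ((n_p - 1) * (n_c - 1) * (n_r - 1))

    # Solve the expected-mean-square equations of the random-effects model.
    return {
        "student": (ms_p - ms_pc - ms_pr + ms_pcr) / (n_c * n_r),
        "construct": (ms_c - ms_pc - ms_cr + ms_pcr) / (n_p * n_r),
        "rater": (ms_r - ms_pr - ms_cr + ms_pcr) / (n_p * n_c),
        "construct*rater": (ms_cr - ms_pcr) / n_p,
        "student*construct": (ms_pc - ms_pcr) / n_r,
        "student*rater": (ms_pr - ms_pcr) / n_c,
        "student*rater*construct": ms_pcr,  # confounded with residual error
    }


# Made-up scores for 4 students x 2 constructs x 2 raters, built from purely
# additive effects so that every interaction component is zero.
a, b, g = [1.0, 2.0, 3.0, 4.0], [0.0, 1.0], [0.0, 0.5]
demo = [[[ap + bc + gr for gr in g] for bc in b] for ap in a]
components = g_study_pxcxr(demo)
```

With real data, estimates obtained this way can come out slightly negative because of sampling error; such values are often set to zero in practice.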



Results and Evidence 

The estimates of variance components from separate sources of variation are shown in Tables 6, 7,
and 8. For grade 3, the variance due to construct is about 10%, and the variance due to the
interaction between constructs and raters is around 7%. For grade 5, the variance due to construct
is just above 7%, and the variance due to the interaction between constructs and raters amounts to
around 20%. For grade 8, an even larger proportion of variance, around 48.49%, is due to
constructs, while the variance due to the interaction between constructs and raters is around
2.18%. It is notable that the variance due to rater is very small in every grade.
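
The percentages quoted above are each component's share of the total estimated variance. A minimal sketch of that arithmetic follows, using the grade 3 overall-average components from Table 6; the function and key names are hypothetical.

```python
def percent_of_total(components):
    """Express each variance component as a percentage of their sum."""
    total = sum(components.values())
    return {name: 100.0 * value / total for name, value in components.items()}


# Grade 3 overall-average variance components as reported in Table 6.
grade3_average = {
    "student": 0.8845,
    "construct": 0.1928,
    "rater": 0.0005,
    "construct*rater": 0.13,
    "student*construct": 0.1531,
    "student*rater": 0.28,
    "student*rater*construct": 0.1873,
}
shares = percent_of_total(grade3_average)
print(round(shares["student"], 2))
```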

G coefficients for each grade were derived to estimate the proportion of expected observed-score
variance that is universe-score variance. For grade 3, the G coefficients for samples A, B, and C
ranged from .747 to .788; for grade 5, from .677 to .763; and for grade 8, from .809 to .821.
Except for the grade 5 sample B coefficient, which was a little lower, the G coefficients fall
into an acceptable range.

Table 6: Estimates of Variance Components for G Study Two-Facet Fully Crossed Design (Grade 3)

                                     Estimated Variance Components
Source of Variation                  Sample A   Sample B   Sample C   Overall    Percent of
                                     (N=228)    (N=213)    (N=213)    Average    Total Variance
Student (Universe Score Variance)    0.9093     0.8948     0.8495     0.8845     48.38%
Construct                            0.2049     0.3148     0.0570     0.1928     10.54%
Rater                                0.00048    0.0002     0.001      0.0005     0.21%
Construct*Rater                      0.014      0.014      0.365      0.13       7.1%
Student*Construct                    0.15       0.1475     0.163      0.1531     8.37%
Student*Rater                        0.25       0.277      0.318      0.28       15.31%
Student*Rater*Construct              0.181      0.189      0.192      0.1873     10.24%
Total Variance                       1.70968    1.8373     1.9455     1.8282     100%
G Coefficient                        0.788      0.775      0.747







Table 7: Estimates of Variance Components for G Study Two-Facet Fully Crossed Design (Grade 5)

                                     Estimated Variance Components
Source of Variation                  Sample A   Sample B   Sample C   Overall    Percent of
                                     (N=227)    (N=231)    (N=224)    Average    Total Variance
Student (Universe Score Variance)    0.7175     0.5450     0.5760     0.61       36.9%
Construct                            0.1130     0.0520     0.1923     0.12       7.26%
Rater                                0.002      0.004      0.001      0.003      0.18%
Construct*Rater                      0.12       0.46       0.45       0.34       20.6%
Student*Construct                    0.18       0.20       0.15       0.18       10.89%
Student*Rater                        0.18       0.22       0.23       0.21       12.7%
Student*Rater*Construct              0.17       0.204      0.185      0.19       11.49%
Total Variance                       1.4825     1.685      1.7843     1.653      100%
G Coefficient                        0.763      0.677      0.709

Table 8: Estimates of Variance Components for G Study Two-Facet Fully Crossed Design (Grade 8)

                                     Estimated Variance Components
Source of Variation                  Sample A   Sample B   Sample C   Overall    Percent of
                                     (N=235)    (N=235)    (N=242)    Average    Total Variance
Student (Universe Score Variance)    1.0039     0.9854     0.9388     0.976      32.3%
Construct                            0.0668     0.0956     4.2341     1.465      48.49%
Rater                                0.001      0.002      0.000      0.001      0%
Construct*Rater                      0.026      0.019      0.154      0.066      2.18%
Student*Construct                    0.092      0.132      0.138      0.12       3.92%
Student*Rater                        0.270      0.257      0.204      0.244      8.1%
Student*Rater*Construct              0.150      0.153      0.142      0.148      4.9%
Total Variance                       1.6079     1.644      5.8109     3.021      100%
G Coefficient                        0.821      0.809      0.820







Discussions 

The data used for this analysis contain only one writing task for each student. It is not only
inappropriate but also impossible to assess the reliability or dependability of this assessment
within the traditional reliability analytical framework. However, the assessments were designed
to focus on the constructs underlying the domains tested, and scoring guides were developed
around these constructs, each comprised of dimensions for raters to consider. Raters were trained
to follow the construct-centered scoring guide. To assess the dependability of the assessments
involved, we incorporated the contextual information provided in the scoring guide and were thus
able to decompose the one complex writing task into a number of smaller constructs, which were
comprised of more homogeneous dimensions. The rationale guiding the approach was to center the
focus on the constructs that were intended to be measured by the domain tested. Thus, we
differentiate this approach from the traditional reliability analytical approach.

Looking at the results of the generalizability analysis obtained using grade 3, 5, and 8 data, we
found that high generalizability or reliability is achievable using the construct-centered
reliability analytical approach where there is a high degree of fit between the substantive
expectations generated from the test specialist’s understanding of the construct, as realized by
the measurement procedure, and observation (Nichols and Smith, 1998).

Educational or Scientific Importance of Study: 

Instead of increasing the number of writing prompts or writing tasks per student, we incorporated
contextual information when we performed the generalizability analysis by centering our focus on
the underlying constructs of the domain tested, thus defining more variance systematically. The
evidence from this study supports the need for reliability studies to incorporate theories of
learning and performance, and this incorporation has practical implications for the use of
different assessment practices. Test developers have an obligation to accommodate different sets
of substantive assumptions in the evaluation of a measurement procedure’s reliability (Nichols &
Smith, 1998). The construct-centered reliability analytical model serves as an alternative to the
traditional reliability analytical model when assessing the dependability of cognitively complex
tasks. This study provides an empirical example of a framework for contextualizing the
interpretation of reliability data, which is critical to validating performance assessment.

Dimension Descriptions

Coherent and engaging story
Moves toward central moment with drama
Tells readers what they need to know to understand what happened
Readers can infer the incident's significance to the writer
Sensory descriptions
Narrating specific action
Creating dialogues
Slowing the pace to elaborate central moment in the incident
Creating suspense or tension
Including the element of surprise
Comparing or contrasting other scenes or people
Less drama than a "6"
Structurally more predictable than a "6"
Less focused, especially toward the end
Uses a narrower range of narrative strategies
Lacks the authority of a "5"
May be momentary digressions
Story may be smoothly told yet unrealized dramatically
Limited use of narrative strategies
Related specific incident
Story competently told
Brief
Flat, unfocused
May be series of loosely connected events
Very limited use of narrative strategies
May fail to focus on an incident
Incohesion. May tell an incident without orienting context or significance
Usually quite brief
If longer, may be rambling, fragmentary, or without details
Contains omissions, erratic jumps in time or place, or breakdowns
May refer to an incident without identifying it specifically
May only imply the incident
May point to an incident without developing it conclusively
Writer may focus on others instead of him/herself
Writer locates incident in a particular setting and orients reader to scene, people and events
Carefully chosen details used to develop the scene or the people
Considerable space devoted to orienting readers, describing the scene and people, and providing
background or context for the incident but not at the expense of a well-told incident
Balance achieved between static context and dramatic narrated incident
Appropriate and adequate context as in a 6-point essay
Context does not dominate at the expense of the incident
Adequate to orient readers to the incident
May devote too much space to context while neglecting the narrative
May begin abruptly without necessary orientation
Context is limited or even missing (1)
Context is limited or even missing (2)

Dimension Descriptions

Authentic voice
Reveals writer’s attitude toward the incident
Well chosen details
Appropriate words
Graceful, varied sentences
Often includes word play and imagery
Engages the reader from the beginning and moves to a satisfying closure
Competent stylistically
May lack the grace, surprise, or sparkle of a "6"
Begins engagingly and closes in a satisfying way
The voice of an earnest story-teller
Predictable sentences and word voice
Writer relates incident in an uninvolved way
Writer does not seem to be seeing the incident as it happened
Minimal evidence of personal involvement
Writer does not seem to be relating specific details about the incident
Sentences may be too short or long in a disorderly way
Writer communicates little or no evidence of personal involvement in the incidence
Reveals by statement or implication why the incident was important to the writer
Significance may be apparent in the writer's insights at the time of the incident or in reflections
from his/her present perspective
Insights/reflections may appear integrated into the narration or in the conclusion
Reflections may be humorous
Significance either implied or stated clearly, through remembered or present reflections
Reflections not as perceptive as 6-point essay, but not superficial
Less well integrated than 6-point essay; often at end of essay
Either implied or stated
Reflection not as insightful as 5-point essay
Reflection may seem tacked on at the end of the essay
Implied or briefly stated
Gives readers an idea why the incident was memorable
Reflections not especially insightful
Few, if any, reflections
Reflections may seem superficial
Little or no significance implied or stated
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Dimension Descriptions 


Clearly defines, identifies or describes the situation to be speculated about | 


Situation is presented fully and precisely, but does not donninate the essay at the expense of speculation 


Writer limits the occasion appropriately; readers’ attention is focused on just those aspects the writer will speculate about | 


The presentation of the situation grounds and focuses the entire essay | 


Reader is immediately engaged | 


Language is conrete, rich in sensory detail | 


Writer uses narrative and descriptives strategies | 


Writer acknowledges readers; concems/questions. | 


Writer convinces readers of the plausibility of the speculation. I 


I For real-world situations, the writer acknowledges reader’s experience or familiarity with a situation, then builds on this to focus reader 
attention on a comparable situation 


Writer establishes authority by consistently demonstrating broad knowledge and clear understanding of the situation | 


Situation is clearly defined, but \A^h less elaboration than a "6" | 


Situation does not dominate the essay at the expense of speculation | 


Writer limits and focuses the occasion, but with less panache than a "6" | 


Writer's knowledge and understanding of the situation is clear throughout, and a sense of confidence and authority is maintained 


Language lacks only the vividness and impact of a "6"


Relies on a narrower range of strategies for presenting the situation than a "6"


Situation presented with less assuredness than a "5" or "6"


Situation may tend to dominate the essay


Not as clearly focused, or may lack the detail or specificity of a "5" or "6"


The presentation of the situation is adequate to orient readers to the proposed causes or effects


Essay contains some explicit speculation


Writing presents a situation


Situation may either be brief or may dominate the essay


Writer may paraphrase the prompt rather than define the situation


Writer may not clearly establish the boundaries of the situation


Writer may not seem to fully understand the situation


Commonplace language


Limited use of strategies


Writer may not acknowledge readers


Writer may attempt to construct a situation but, because of omissions, erratic jumps in time or place or breakdowns in cohesion, will not 
establish focus. 


Situation may dominate the essay


Essay may include no occasion, beginning abruptly with a list of causes or effects


Writer exhibits only minimal understanding of the situation


If there is a situation, it will be very brief and devoid of specificity or concreteness


Essay may point vaguely to a situation without focusing or establishing boundaries 
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Dimension Descriptions 


Proposed causes and effects are clearly related to the situation


Writer uses imaginative, investigative, inventive arguments to convince readers of the logic of the speculations


Multiple perspectives/possibilities are considered


Writer stretches imagination to take ideas as far as possible; readers are convinced that the writer's speculations are plausible and appropriate to the situation as defined


Writer maintains focus by establishing and continually developing the close relationship between the particular situation and the 
causes/effects that might arise from it 


Writer is continually aware of reader's needs


Writer may employ some of the following strategies:


Building a succession of causes or effects, each changing the complex:


Writer establishes, maintains, and develops a plausible relationship between the situation and each of the proposed causes or effects


The speculation is naturally linked to the situation.


Transitions skillfully keep the reader grounded both in the relationship between the situation and the proposed causes and effects and in 
the logical development and progression of the speculation itself 


Writer uses the transitions to carry the reader along with the methodological development of the argument


Writer weaves together facts, opinions & projections to create and develop convincing reasons


Proposed causes or effects are linked naturally to the defined situation


Writer conjectures persuasively for possible causes or effects


Speculations are serious and logical, lacking only the freshness and imagination of a "6".


Obviously statements about probable causes and effects to speculations that are not entirely predictable.


Essay may lack the clarity of focus, the continuity, or the growing insight and fullness of a "6".


Writer has a consistent awareness of audience


Speculations are insightful, but not as probing as a "6"


Writer keeps the reader grounded in both the situation and the speculation, although not as consistently as a "6"


Writer establishes a connection between the situation and the postulated causes or effects, but may not maintain this connection as explicitly or effectively as a "5" or "6".


Speculations characterized by thoughtfulness rather than inventiveness 


Proposed causes or effects may be logical but predictable


Speculations may be connected


Acknowledgment of readers not as evident as in a "5" or "6"


Speculations are at least tangentially relevant.


Writer may tend to list a series of causes or effects rather than develop them or ground them in the situation.


Little effort to convince the reader by developing a logical cause-effect relationship.


Speculations may seem obvious, superficial, or predictable.


Proposed causes/effects arise from or are appropriate to the situation, but may seem tangential and not grounded as firmly in the 
situations. 


Little conscious awareness of the reader.


Essay may have a meandering quality 


May be only one minimally developed cause/effect


If there is a situation, the speculations may be either brief or meandering/unfocused


Little evidence of any logical organization


Some of the proposed causes/effects may seem illogical or unrelated to the situation


Little connection between the situation and the speculations


If there are speculations, they are brief and superficial attempts at prediction rather than considered exploration of possibilities


No evidence of a logical connection between the situation and the causes or effects.
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Dimension Descriptions 


Essay provides substantial elaboration.


Reader is convinced that the writer's conjectures are valid for the situations


Writer uses carefully chosen evidence that is logically and fully developed.


Reader is convinced of both the logic and the authenticity of the proposed cause/effect.


Strategies used to develop arguments:


Writer may mention several possible causes/effects, developing and linking them.


Writer may only mention one cause/effect, building it fully and examining it closely from a variety of perspectives.


Writer makes a full and convincing argument for at least one postulated cause and one postulated effect.


Writer engages in extended, thoughtful speculation


Writer uses effective arguments to convince the reader of the logic and validity of the speculations.


Writer chooses evidence that is relevant and convincing.


Supporting evidence is more predictable than a "6".


Supporting details are relevant and convincing; not as richly developed as a "6".


Writer offers less persuasive evidence for the validity of the proposed causes/effects.


Essay exhibits some internal logic and an overall sense of organization.


Essay may not show a consistent relationship between the situation and cause/effects.


May be some irrelevant details.


Elaboration is limited - perhaps to a brief explanation of one cause/effect or a listing of several with minimal development.


Essay lacks consistency of development of details.


Sequence and organizational pattern seem unclear.


Essay may seem generally competent and the speculations interesting


Little elaboration, often merely listing.


Essay often brief.


Little development either of the situation or of causes/effects.


May be extended generalized rambling.


May merely list causes/effects without support of argument.


Some details may be irrelevant and unconnected to either the speculations or the situation; essay contains little or no argument or effort to persuade the reader.


Little or no elaboration of either the situation or of the causes/effects.


Speculations, if presented at all, are not argued.


Rarely is there any sense of the reader.


Essay is brief; essay is often not coherent.
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Relationship between Domain Tested and the Underlying Constructs 

Grade 3 Domain Tested: 




Grade 5 Domain Tested: 




Grade 8 Domain Tested: 
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Chart 2 



Dimensions indicating the construct of incidence (grade 3) 
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Chart 3 



Dimensions indicating the construct of context (grade 3) 
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Chart 4 



Dimensions indicating the construct of voice and style (grade 3) 
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Chart 5



Dimensions indicating the construct of significance (grade 3)
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Chart 6 



Dimensions indicating the construct of identification of subject (grade 5) 
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Chart 7 



Dimensions indicating the construct of context (grade 5) 
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Chart 8 



Dimensions indicating the construct of observational stance (grade 5) 
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Chart 9 



Dimensions indicating the construct of presentation of the experience (grade 5) 
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Chart 10 



Dimensions indicating the construct of presentation of the situation (grade 8) 
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Chart 11 



Dimensions indicating the construct of logic and relevance of causes (grade 8) 
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Chart 12 



Dimensions indicating the construct of elaboration of argument (grade 8) 
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WRITING ASSESSMENT 

Conventions Scoring Guide 
for 

ALL TYPES OF WRITING



score points 


Criteria 


6 

Distinguished 

Achievement 


Reader rarely spots errors in conventions 

• Usage: Writer demonstrates a thorough command of written English.
There may be an occasional lapse due to experimentation with complex 
ideas and styles. 

• Mechanics and Spelling: Writer demonstrates command of mechanics
and spelling. Minor errors expected in first-draft writing may occur, but
they are rare and do not take away from the effectiveness of the writing 
style. 


5 

Noteworthy 

Achievement 


Reader seldom spots errors in conventions 
Writer may have difficulty with some more sophisticated conventions 

• Usage: Writer demonstrates control of accepted usage; few errors are
noticeable 

• Mechanics and Spelling: Writer demonstrates control of mechanics and
spelling but may commit a few errors repeatedly. These do not detract 
from the overall impression of the essay. 


4 

Satisfactory 

Achievement 


Reader sometimes spots errors in conventions 

• Usage: Writer gives general evidence of an understanding of common
usage while committing more than one kind of error. The reader may be 
aware of these errors but they do not hinder understanding. 

• Mechanics and Spelling: Writer demonstrates general control of
mechanics and spelling but gives evidence of different kinds of errors
throughout the essay. The essay may nevertheless be read with relative
ease and few distractions.


3 

Some Indication of 
Achievement 


Reader finds numerous errors in conventions 

• Usage: Writer demonstrates some control of the conventions of usage
but frequent errors appear. These usage problems may cause 
misunderstanding on the part of the reader because the meaning is 
unclear. 

• Mechanics and Spelling: Writer shows some control of mechanics and 
spelling but commits many errors repeatedly. The errors that occur 
interrupt the flow of ideas, which may confuse the reader. 


2 

Limited Indication of 
Achievement 


Reader is continually aware of errors in conventions 

• Usage: Writer demonstrates limited understanding of the conventions of
usage. The frequency and seriousness of usage problems may result in a
lack of understanding in some areas, but most of the essay can be read
with comprehension of the writer's intent.

• Mechanics and Spelling: Writer demonstrates little control, committing
frequent and serious errors in mechanics and spelling. These errors 
cause misunderstanding and confusion on the part of the reader 


1 

Few Indications of 
Achievement 


Reader is disturbed by repeated errors in conventions 

• Usage: Writer demonstrates very little control of conventions of usage, or
the piece may be so limited that there is little on which to base a 
judgment. Essay may be unintelligible. 

• Mechanics and Spelling: Writer demonstrates little ability in mechanics
and spelling. Serious errors may occur with regularity, or the piece may
be too brief to evaluate the writer's level of proficiency with English 
language conventions. 



OFF TOPIC papers are scored for conventions only 
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